Multimodal Emotion Recognition


Multimodal emotion recognition is the task of identifying a person's emotional state by combining complementary signals from multiple modalities, such as speech audio, spoken or written text, and facial expressions.
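A common baseline for combining such signals is feature-level fusion: per-modality embeddings are concatenated and passed to a shared classifier. The sketch below is illustrative only and not the method of any paper listed here; the feature dimensions, number of emotion classes, and the `LateFusionEmotionClassifier` name are all assumptions.

```python
# Minimal feature-level fusion sketch (illustrative; assumes pre-extracted
# per-modality feature vectors, e.g. from audio, text, and video encoders).
import torch
import torch.nn as nn

class LateFusionEmotionClassifier(nn.Module):
    def __init__(self, audio_dim=128, text_dim=768, video_dim=512, num_emotions=7):
        super().__init__()
        # Concatenated embedding -> hidden layer -> emotion logits.
        self.head = nn.Sequential(
            nn.Linear(audio_dim + text_dim + video_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, num_emotions),
        )

    def forward(self, audio_feat, text_feat, video_feat):
        # Fuse modalities by concatenation along the feature axis.
        fused = torch.cat([audio_feat, text_feat, video_feat], dim=-1)
        return self.head(fused)

# Usage with random stand-in features for a batch of 4 utterances.
model = LateFusionEmotionClassifier()
logits = model(torch.randn(4, 128), torch.randn(4, 768), torch.randn(4, 512))
print(logits.argmax(dim=-1))  # predicted emotion class indices
```

More elaborate systems replace the concatenation with cross-modal attention or handle missing modalities explicitly, as several of the papers below do.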

XEmoGPT: An Explainable Multimodal Emotion Recognition Framework with Cue-Level Perception and Reasoning

Feb 05, 2026

Decoupled Hierarchical Distillation for Multimodal Emotion Recognition

Feb 04, 2026

A Baseline Multimodal Approach to Emotion Recognition in Conversations

Jan 31, 2026

AmbER$^2$: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text

Jan 25, 2026

Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding

Jan 23, 2026

STARS: Shared-specific Translation and Alignment for missing-modality Remote Sensing Semantic Segmentation

Jan 24, 2026

Not all Blends are Equal: The BLEMORE Dataset of Blended Emotion Expressions with Relative Salience Annotations

Jan 19, 2026

Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models

Jan 21, 2026

A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models

Jan 12, 2026

Can Vision-Language Models Understand Construction Workers? An Exploratory Study

Jan 15, 2026